@kito-cheng
Collaborator

@kito-cheng kito-cheng commented Aug 11, 2025

Various RISC-V extensions reuse existing instruction encoding spaces
for specialized behaviors. CFI extensions use NOP or MOP (Maybe Operation)
instructions to ensure correct program execution even when hardware
doesn't implement CFI. Similarly, hint extensions redefine specific
encodings for performance optimization purposes.

However, this design conflicts with current toolchain conventions,
where X extension instructions are only usable when the X extension
is enabled: assemblers accept corresponding mnemonics and disassemblers
decode instructions only when the extension is active.

The affected use cases require different behavior:

  • Zicfiss instructions should be available when Zimop extension is present
  • Zicfilp instructions should be usable even without explicit enabling
    (since auipc is in baseline ISA)
  • Hint extensions should reinterpret specific encodings regardless of
    extension enablement

Following conventional toolchain behavior would force compilers to
generate generic instructions ('mop', 'fence w,0', 'auipc x0, <value>')
instead of extension-specific mnemonics ('sspush/sspop', 'pause',
'lpad <value>'), causing user confusion and reducing code clarity.

While this could be addressed at the ISA specification level, such
changes involve complex and time-consuming procedures. Therefore,
this psABI defines Tag_RISCV_mop_and_hint_encoding to provide toolchain
implementations with clear guidance.

This tag uses a bitmap format where each bit indicates whether specific
instruction encodings should be reinterpreted:

  • Bit 0: auipc x0, <value> as lpad <value> (Zicfilp)
  • Bit 1: Zimop encodings as Zicfiss instructions
  • Bit 2: fence w,0 as pause hints (Zihintpause)
  • Bit 3: specific encodings as non-temporal locality hints (Zihintntl)

The tag uses bitwise OR merge policy to allow combining multiple
reinterpretation requirements, with toolchain warnings recommended
when encoding space conflicts occur.
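
As a rough illustration of the bitmap and merge policy described above, the following Python sketch models the four bits as flags and merges per-object values with bitwise OR. The names `MopHintEncoding` and `merge_tag` are hypothetical and exist only for this example; the PR defines just the bit positions and the OR merge policy, and treating an absent tag as 0 is an assumption.

```python
from enum import IntFlag
from functools import reduce
from operator import or_


class MopHintEncoding(IntFlag):
    """Hypothetical model of the proposed bitmap (bit positions from the PR text)."""
    LPAD = 1 << 0     # auipc x0, <value> shown as lpad <value> (Zicfilp)
    ZICFISS = 1 << 1  # Zimop encodings shown as Zicfiss instructions
    PAUSE = 1 << 2    # fence w,0 shown as pause (Zihintpause)
    NTL = 1 << 3      # specific encodings shown as Zihintntl hints


def merge_tag(values):
    """Bitwise-OR merge of per-object tag values (absent tag assumed to be 0)."""
    return reduce(or_, values, MopHintEncoding(0))


# Three input objects: one asks for lpad, one for pause, one carries no tag.
merged = merge_tag([MopHintEncoding.LPAD, MopHintEncoding.PAUSE, MopHintEncoding(0)])
```

With OR merging, a reinterpretation requested by any single input object survives in the output, so `merged` here has bits 0 and 2 set.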

@kito-cheng
Collaborator Author

@ved-rivos @aswaterman this is what we discussed in the mail before, but a little bit late ...:P

@kito-cheng
Collaborator Author

cc @mylai-mtk @deepak0414 @jaidTw

@mylai-mtk

We already have the .note.gnu.property section that marks the adoption of Zicfilp/Zicfiss, and I believe that this section could be and would be generated if the CFI features are enabled in the compiler/linker invocations that created the object files, since the section could just be ignored by platforms/tools that don't recognize it. With this existing mechanism, why bother creating a new data field to mark the adoption of the CFI features?

@kito-cheng
Collaborator Author

> We already have the .note.gnu.property section that marks the adoption of Zicfilp/Zicfiss, and I believe that this section could be and would be generated if the CFI features are enabled in the compiler/linker invocations that created the object files, since the section could just be ignored by platforms/tools that don't recognize it. With this existing mechanism, why bother creating a new data field to mark the adoption of the CFI features?

Because .note.gnu.property may not be set if some input object was built without CFI code generation enabled, and the disassembler would then decode Zicfiss/Zicfilp instructions as Zimop/Zcmop/auipc x0.

Also, on the assembler side, this would let users write lpad/sspush/sspop (and all other Zicfiss instructions) without enabling Zicfiss/Zicfilp.

The goal is to make toolchain behavior consistent, so adding a new marker lets the toolchain know that a special convention is in use here (rather than just introducing a special rule).
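
The difference between the two mechanisms can be sketched as follows. This is a minimal Python illustration that assumes the CFI bits in .note.gnu.property are combined with AND semantics at link time, as the reply above implies, while the proposed tag is combined with OR. The bit assignments and function names are hypothetical.

```python
from functools import reduce

# Hypothetical bit assignments, for illustration only.
CFI_LP = 1 << 0  # landing pads (Zicfilp) in use
CFI_SS = 1 << 1  # shadow stack (Zicfiss) in use


def merge_and(props):
    """AND merge: a feature survives only if every input object declares it."""
    return reduce(lambda a, b: a & b, props) if props else 0


def merge_or(tags):
    """OR merge: a feature survives if any input object declares it."""
    return reduce(lambda a, b: a | b, tags, 0)


# Two objects built with CFI enabled, one built without it.
objects = [CFI_LP | CFI_SS, CFI_LP | CFI_SS, 0]

and_result = merge_and(objects)  # property is dropped for the whole output
or_result = merge_or(objects)    # reinterpretation hint is preserved
```

Under these assumptions a single non-CFI object clears the AND-merged property, so a disassembler relying on it alone would fall back to the generic mnemonics even though most of the code uses lpad and shadow-stack instructions.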

@mylai-mtk

mylai-mtk commented Aug 11, 2025

> While this could be addressed at the ISA specification level, such changes involve complex and time-consuming procedures. Therefore, this psABI defines Tag_RISCV_cfi_encoding to provide toolchain implementations with clear guidance.

Would there be plans to incorporate some kind of framework at the ISA level? Based on the PR description, the current conflicts only arise from toolchain conventions, so I think maybe the psABI is the right place to solve this issue, since after all the problems arise from toolchain implementations and decisions, which are technically independent of the ISA.

If this is the right place to regulate the usage of these kinds of "always-on" extensions, would it be better to provide a more generic framework than the currently proposed one, which serves only the CFI features?

@kito-cheng
Collaborator Author

> Would there be plans to incorporate some kind of framework at the ISA level? Based on the PR description, the current conflicts only arise from toolchain conventions, so I think maybe the psABI is the right place to solve this issue, since after all the problems arise from toolchain implementations and decisions, which are technically independent of the ISA.
>
> If this is the right place to regulate the usage of these kinds of "always-on" extensions, would it be better to provide a more generic framework than the currently proposed one, which serves only the CFI features?

Another case is hint instructions (e.g. Zihintpause and Zihintntl). I originally thought they were different from the CFI situation, but yeah, they’re actually pretty similar. I'm still not keen on adding a separate bit for every special case, but given what you said about the potential need to reallocate opcode space, having separate bits does sound like a more future-proof approach.

So let me update this PR and make it more generic.

@kito-cheng kito-cheng force-pushed the kitoc/Tag_RISCV_mop_cfi branch from 632954d to 54ac0a2 Compare August 12, 2025 01:45
@kito-cheng kito-cheng changed the title Add Tag_RISCV_cfi_encoding for CFI instruction interpretation Add Tag_RISCV_mop_and_hint_encoding for instruction reinterpretation Aug 12, 2025
@kito-cheng
Collaborator Author

Changes:

  • Change to bitmap rather than just 0 or 1
  • Included Zihintpause and Zihintntl

@deepak0414

Sounds like a good idea.
Consumers of this would be anything that decodes instructions (disassemblers, instruction/binary walkers, etc.).

Suggestion1: Instead of Tag_RISCV_mop_and_hint_encoding, I think a better name would be Tag_RISCV_Insn_Reinterpret.

Suggestion2: Keep the highest bit reserved and consume/set it when we run out of 127 bits :-)

@kito-cheng
Collaborator Author

> Suggestion1: Instead of Tag_RISCV_mop_and_hint_encoding, I think a better name would be Tag_RISCV_Insn_Reinterpret.

I get your point about Tag_RISCV_Insn_Reinterpret being more general.
But for now this tag is really just about MOP/HINT, so I’d rather keep the
name Tag_RISCV_mop_and_hint_encoding to make that clear :)

> Suggestion2: Keep the highest bit reserved and consume/set it when we run out of 127 bits :-)

ULEB128 is a variable-length encoding, so in theory we have an unlimited number of bits to use.
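
To make the point about variable length concrete, here is a minimal Python sketch of ULEB128 encoding and decoding. The function names are hypothetical and not taken from any toolchain. Each byte carries 7 payload bits, and the top bit marks whether another byte follows, so a bitmap that grows past bit 6 simply takes one more byte.

```python
def uleb128_encode(value: int) -> bytes:
    """Encode a non-negative integer as ULEB128 (7 payload bits per byte)."""
    if value < 0:
        raise ValueError("ULEB128 encodes non-negative integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)         # last byte has the top bit clear
            return bytes(out)


def uleb128_decode(data: bytes) -> int:
    """Decode a single ULEB128 value from the start of data."""
    result = 0
    for index, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * index)
        if not byte & 0x80:
            return result
    raise ValueError("truncated ULEB128 value")
```

For example, a bitmap with only bits 0 to 3 set fits in one byte, while setting bit 7 yields the two-byte sequence `0x80 0x01`.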

@kito-cheng
Collaborator Author

Change:

  • Added note that disassemblers rely on provided hints to correctly decode Maybe instructions and related encodings.
  • Update wording

@mylai-mtk

@kito-cheng

Actually, I would have suggested a name like Tag_insn_reinterpret myself if @deepak0414 had not already done so.

In my opinion, the "mop_and_hint_encoding" name is too limiting. Take the Zicfilp LPAD insn as a counterexample: it's not encoded in the MOP space and it's also not a hint[1], yet it will be affected by "mop_and_hint_encoding". The name doesn't really suit the scenario here.

Please do consider the name change, even though a broader name may leave some unintended wiggle room.

[1]:

> HINTs do not change any architecturally visible state, except for advancing the pc and any applicable performance counters.
> Implementations are always allowed to ignore the encoded hints.

-- The RISC-V Instruction Set Manual Volume I Unprivileged Architecture Version 20250508

@kito-cheng
Collaborator Author

@mylai-mtk

@kito-cheng

> lpad is a hint according to the spec, I think?

If LPAD is a HINT, then I'm confused. LPAD surely modifies architecturally visible state (the ELP bit in CSRs), and an implementation that implements it is not allowed to ignore it, in that the implementation must check reg x7 == uimm20 or raise an exception. I would argue that LPAD does not really qualify as a HINT insn, even though it uses a HINT insn encoding point.

It's a loophole in the ISA description of HINTs. I think the problem is that HINT encoding points are not necessarily used as HINT insns (HINT insns do not change any architecturally visible state, except for advancing the pc and any applicable performance counters, and may be ignored by implementations). These HINT encoding points can be used to encode extension insns that have architecturally observable effects and thus cannot be ignored if the implementation claims to implement the extension.

But anyway, at least the auipc x0, uimm20 encoding point is a listed HINT encoding point. The name "mop_and_hint_encoding" fits Zicfilp in this sense, so I'll drop my opinion about name changing.

@kito-cheng
Collaborator Author

In the last psABI meeting, we discussed this issue and the conclusion was to NOT add this tag, but instead handle it directly in the toolchain.
The main reason is to avoid giving the impression that this problem has solution and we could reuse for hint or mop encodings.

So the next step is to implement it in the upstream open-source toolchain and continue discussion there.

This PR will stay open until the upstream side is resolved.

@mylai-mtk

@kito-cheng

> In the last psABI meeting, we discussed this issue and the conclusion was to NOT add this tag, but instead handle it directly in the toolchain. The main reason is to avoid giving the impression that this problem has solution and we could reuse for hint or mop encodings.
>
> So the next step is to implement it in the upstream open-source toolchain and continue discussion there.
>
> This PR will stay open until the upstream side is resolved.

Excuse me, but I need a clearer description than this. The quoted description does not reveal or hint at the proposed way to drive this PR's vision forward.

> The main reason is to avoid giving the impression that this problem has solution

I guess the "problem" here is this PR's vision? (which I would write down as "in assembler/disassembler, to encode/display MOP/hint insns as if a specific ISA extension is enabled, even though that ISA extension is not formally enabled and not listed in Tag_RISCV_arch"). If so, by saying "avoid giving the impression that this problem has solution", I assume that in the psABI meeting, the conclusion is that this "problem" (or as I'd like to call it, "feature") would not be addressed by the standard, i.e. it has no standard "solution", and the toolchain could do whatever they want with it, which includes the options of not implementing the "feature" at all or implementing it with some non-standard (but perhaps agreed by both gnu binutils and llvm) method. Am I correct in this understanding?

If my understanding is correct, then "the next step is to implement it in the upstream" means not/stop implementing this "feature" at all or implementing it in whatever way the toolchain likes it?

> avoid giving the impression that ... and we could reuse for hint or mop encodings.

Does this mean that in the meeting, the conclusion is that MOP/hint encoding spaces could not be reused?
(In the quoted sentence, what is the "reuse" the meeting avoids? We usually use the verb "reuse" in this structure: "reuse something for some purpose". What is the "something" in your sentence?)

@jrtc27
Collaborator

jrtc27 commented Sep 15, 2025

I publish public minutes from all psABI meetings; in this case, you can read a summary of the conversation at https://github.com/riscv-admin/psabi/blob/master/MINUTES/2025/meeting-20250911.adoc#status-update-for-cfi-related-prs

@mylai-mtk

After reading the meeting minutes, I'll answer my own questions:

> Does this mean that in the meeting, the conclusion is that MOP/hint encoding spaces could not be reused?

Reuse of the MOP/hint encoding spaces is strongly discouraged. The ecosystem (toolchains included) would not guarantee correctness if they are reused. ("Philip: Will go stronger. Changing it will not work.")

> If my understanding is correct, then "the next step is to implement it in the upstream" means not/stop implementing this "feature" at all or implementing it in whatever way the toolchain likes it?

Since encoding space reuse is not welcome, the meeting minutes recommend implementing the "feature" as always-on, i.e. these encodings are always interpreted with the implicit (not listed in Tag_RISCV_arch) ISA extensions. ("Philip: Should just always enable recognising e.g. LPAD in the assembler and disassembler, or at least on by default. ; Sam: Agree with that. ;")

(The meeting minutes are just dialogue from the meeting and are largely incomplete in terms of context and conclusion, but I think my understanding based on reading them aligns with the conclusion reached in the meeting as relayed by @kito-cheng.)
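
The always-on interpretation summarized above could look roughly like the following in a disassembler. This is a minimal Python sketch, not code from binutils or LLVM. The function name `mnemonic` and the output format are hypothetical, and only two cases discussed in this thread are handled (`auipc x0, <value>` as `lpad <value>`, and `fence w,0` as `pause`).

```python
AUIPC_OPCODE = 0x17      # major opcode of AUIPC (bits 6:0)
PAUSE_WORD = 0x0100000F  # fence w,0 with fm = 0 and rd = rs1 = x0


def mnemonic(word: int) -> str:
    """Pick a mnemonic for a 32-bit instruction word, always preferring the
    extension-specific spelling regardless of which extensions are enabled."""
    opcode = word & 0x7F
    rd = (word >> 7) & 0x1F
    imm20 = (word >> 12) & 0xFFFFF
    if word == PAUSE_WORD:
        return "pause"
    if opcode == AUIPC_OPCODE:
        if rd == 0:
            # auipc with rd = x0 is the encoding Zicfilp uses for lpad
            return f"lpad {imm20:#x}"
        return f"auipc x{rd}, {imm20:#x}"
    # Anything else is outside the scope of this sketch.
    return f".word {word:#010x}"
```

A real implementation would also cover the Zimop/Zcmop encodings used by Zicfiss and the Zihintntl hints, which are omitted here for brevity.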
